Wendi Li

Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

May 29, 2026
Via arXiv

Cyclical Entropy Eruption: Entropy Dynamics in Agent Reinforcement Learning

May 27, 2026
Via arXiv

LAD: Learning Advantage Distribution for Reasoning

Feb 23, 2026
Via arXiv

Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents

Feb 04, 2026
Via arXiv

LLM-based Human-like Traffic Simulation for Self-driving Tests

Aug 23, 2025
Via arXiv

Process Reinforcement through Implicit Rewards

Feb 03, 2025
Via arXiv

Free Process Rewards without Process Labels

Dec 02, 2024
Via arXiv

FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks

Oct 28, 2024
Via arXiv

Process Reward Model with Q-Value Rankings

Oct 15, 2024
Via arXiv

Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue

Jun 04, 2024
Via arXiv